Test Results and Analysis

Test environment

Machines: IBM cluster

Dataset: 8,410 malware samples in total, from 25 malware families

Performance metrics

Precision: in this paper, the fraction of retrieved malware descriptor vectors that are relevant to the target malware descriptor vector.

Precision = |{relevant malware images}∩{retrieved malware images}| / |{retrieved malware images}|

Recall: in this paper, the fraction of the malware images relevant to the target malware image that are successfully retrieved.

Recall = |{relevant malware images}∩{retrieved malware images}| / |{relevant malware images}|

F-measure: the harmonic mean of precision and recall

Query time: the time from receiving a query to returning the result set
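
The three retrieval metrics above can be computed per query from two sets of sample identifiers. The following is a minimal sketch rather than the paper's code: the function name `retrieval_metrics` and the use of string identifiers are assumptions made for illustration, and the F-measure is taken as the balanced harmonic mean 2PR / (P + R).

```python
def retrieval_metrics(retrieved, relevant):
    """Return (precision, recall, f_measure) for a single query.

    retrieved: identifiers returned by the search
    relevant:  identifiers that are truly relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Four items retrieved, three relevant, two in common.
p, r, f = retrieval_metrics(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r, f)
```

In this toy query, two of the four retrieved items are relevant and two of the three relevant items are retrieved, so precision is 1/2, recall is 2/3, and the F-measure is 4/7.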

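Experimental results

Basic test:

Sample used: Panda Virus

1. Compute the feature vector

2. For the query, retrieve the top K most similar matches from the malware feature vector database using the prebuilt hash tables

With the LSH index, selecting the top 10 similar items took about 20 ms. These 10 items are highly similar to one another, so queries for these variants differ very little.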

Extended test:

Sample selection: 8 samples are drawn at random from each of the 25 malware families, giving a query set of 200 malware samples.
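
The query set construction (8 random samples from each of 25 families, 200 in total) can be sketched as below. This is an illustration only: the `families` dictionary, its placeholder sample names, the family size of 20, and the fixed seed are all assumptions, not data from the paper.

```python
import random


def build_query_set(families, per_family=8, seed=0):
    """Draw `per_family` samples without replacement from every family."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    query_set = []
    for name in sorted(families):
        query_set.extend(rng.sample(families[name], per_family))
    return query_set


# Hypothetical stand-in data: 25 families with 20 placeholder samples each.
families = {
    f"family_{i:02d}": [f"family_{i:02d}/sample_{j}" for j in range(20)]
    for i in range(25)
}
queries = build_query_set(families)
print(len(queries))
```

Sampling per family instead of from the pooled dataset keeps every family equally represented in the query set, regardless of how many samples each family has.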

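Sample size: to evaluate how well the DMD system adapts to dataset size, test sets ranging from 100 to 10,000 descriptor vectors were designed (8,410 malware samples and 1,590 benign programs, with a confusion degree of 15%).

Test groups: ten test groups in total, each evaluated with both DELSH and a linear brute-force search based on the Euclidean distance similarity measure.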
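
The brute-force baseline compares the query against every stored descriptor vector and keeps the closest ones. A minimal sketch, assuming descriptor vectors are plain lists of floats; `brute_force_top_k` is a hypothetical name, and the paper's actual implementation is not shown in these notes.

```python
import math


def brute_force_top_k(query, database, k=10):
    """Return the k nearest vectors as (index, distance), closest first."""
    # One Euclidean distance per stored vector: cost grows linearly with
    # the number of vectors in the database.
    scored = [(math.dist(query, vec), idx) for idx, vec in enumerate(database)]
    scored.sort()
    return [(idx, dist) for dist, idx in scored[:k]]


database = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.1, 0.0]]
print(brute_force_top_k([0.0, 0.0], database, k=2))
```

Because this search is exact, its top-10 result is a natural reference point when judging how much accuracy an approximate index gives up.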

DELSH parameters: the ELSH parameters (W = 0.7, k = 5) are used as the grayscale-image parameters for ELSH in the DMD system.
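
To make the roles of W and k concrete, here is a single-machine sketch of a Euclidean (p-stable) LSH index. It assumes that ELSH refers to the common scheme h(v) = floor((a·v + b) / W), with k such hashes concatenated into one bucket key; these notes do not confirm that. The class name, the `num_tables` parameter, and the toy vectors are hypothetical, and the distributed part of DELSH is not modeled.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    def __init__(self, dim, w=0.7, k=5, num_tables=4, seed=0):
        rng = random.Random(seed)
        self.w = w
        self.vectors = []
        self.tables = []
        for _ in range(num_tables):
            # k hash functions per table: Gaussian direction a, offset b in [0, w).
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                for _ in range(k)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, vec):
        # Bucket key: the k quantized projections floor((a.v + b) / w).
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, vec)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, vec):
        idx = len(self.vectors)
        self.vectors.append(list(vec))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, vec)].append(idx)
        return idx

    def query(self, vec, top_k=10):
        # Collect candidates from the matching bucket of every table,
        # then rank only those candidates by exact Euclidean distance.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, vec), ()))
        ranked = sorted((math.dist(vec, self.vectors[i]), i) for i in candidates)
        return [(i, d) for d, i in ranked[:top_k]]


index = EuclideanLSH(dim=3)
for v in ([0.10, 0.20, 0.30], [0.10, 0.20, 0.31], [5.0, 5.0, 5.0]):
    index.add(v)
print(index.query([0.10, 0.20, 0.30], top_k=2))
```

A smaller W or a larger k makes buckets more selective, which shortens the candidate list but raises the chance of missing a true neighbor. That trade-off is one plausible reason an LSH index can trail an exact search slightly in precision and recall.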

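Obtaining the relevant malware descriptor vectors: take the top ten similar items and compare them against the ground truth.

Computing the three performance metrics: substitute retrievalMI and relevantMI into the formulas to obtain precision, recall, and the F-measure.

Precision and recall: DELSH scores lower than brute force on both precision and recall, but the precision gap does not exceed 3%. Both metrics decrease as the dataset grows.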

Query time

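Brute force: query time grows linearly with dataset size

DELSH: query time grows non-linearly with dataset size

DELSH is nearly ten times faster than brute force, and the gap widens as the dataset grows.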
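
Query time, defined earlier as the span from receiving a query to returning the result set, can be measured by wrapping a single search call in a wall-clock timer. A minimal sketch: `timed_query` is a hypothetical helper, and the built-in `sorted` stands in for a real search function.

```python
import time


def timed_query(query_fn, *args, **kwargs):
    """Run one query and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = query_fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Any callable can be timed; a trivial one is used here as a placeholder.
result, elapsed_ms = timed_query(sorted, [3, 1, 2])
print(result, f"{elapsed_ms:.3f} ms")
```

Averaging this measurement over every query in the query set, once per dataset size, would give one point per size on a time-versus-size curve for each search method.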

[Paper Notes] Experimental Result And Analysis: Distributed Malware Detection Based on Binary File Features in Cloud Computing

04-Distributed Malware Detection based on Binary File Features in Cloud computing Environment（DMDsystem）